Harmonise CURF with AES 2022

Author
Affiliation

University of Sydney

Published

10:17PM 3 June 2023

1 CURF

The CURF (Confidentialised Unit Record File) is an anonymised sample of microdata from the Australian Census of Population and Housing, comprising a random 1% of person-level records from the Census. At the time of writing, data from the 2021 Census are not yet available; we use the 2016 “1% Basic CURF”.

We perform a one-time read of the CURF from csv and re-write a local copy as fst for fast loading:

Load fst version of the CURF file:
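A minimal sketch of this csv-to-fst round trip, assuming the fst package is installed. The file paths, the toy data frame and its values are illustrative stand-ins, not the actual CURF file.

```r
library(fst)  # assumed installed; provides write_fst() and read_fst()

# Toy stand-in for the CURF csv (paths and values are illustrative only)
csv_path <- tempfile(fileext = ".csv")
fst_path <- tempfile(fileext = ".fst")
toy <- data.frame(AREAENUM = c(1L, 1L, 2L), AGEP = c(34L, 51L, 27L))
write.csv(toy, csv_path, row.names = FALSE)

# One-time conversion: parse the csv once, then store a binary fst copy
if (!file.exists(fst_path)) {
  curf_csv <- read.csv(csv_path)
  write_fst(curf_csv, fst_path)
}

# Later sessions skip the csv and load the fst copy directly
curf <- read_fst(fst_path)
```

Guarding the conversion with `file.exists()` means re-rendering the document does not repeat the slow csv parse.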

1.1 Geo-codes and covariates

The CURF geocodes individuals in groupings of SA4s. We build a data set of these groupings and the corresponding SA4s.

1.2 Cross walk from AES geocodes (SA2s and postcodes) to CURF SA4

AES supplies postcodes for all respondents (at least in the restricted access version of the file) and some SA1 information.

Join SA2 to SA4s.
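The join can be sketched as a left merge from respondent-level SA2 codes onto an SA2-to-SA4 concordance. The data frames, column names and codes below are made up for illustration; the real concordance and AES identifiers will differ.

```r
# Toy SA2 -> SA4 concordance (codes are invented for illustration)
sa2_to_sa4 <- data.frame(
  SA2 = c("101011001", "101011002", "102011028"),
  SA4 = c("101", "101", "102"),
  stringsAsFactors = FALSE
)

# Toy respondent file; one respondent has no SA2 recorded
aes_geo <- data.frame(
  ID  = 1:4,
  SA2 = c("101011002", "102011028", NA, "101011001"),
  stringsAsFactors = FALSE
)

# Left join: keep every respondent, SA4 is NA where no SA2 match exists
aes_geo <- merge(aes_geo, sa2_to_sa4, by = "SA2", all.x = TRUE, sort = FALSE)
aes_geo <- aes_geo[order(aes_geo$ID), ]
```

Keeping the geographic codes as character strings avoids losing leading zeros or triggering numeric formatting of long codes.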

1.3 CURF recodes

Variable labels and recodes, drawing on XLS file accompanying CURF:

Save the version of the CURF we have at this stage:

1.4 Marginal distributions of variables likely appearing in the AES:

2 SA4 covariates

The AES is sparse with respect to both SA4s and the agglomerations of SA4s used to geo-code individual records in the CURF. ABS provides data on SA4s from the 2021 Census which we download and combine into a data set of covariates elsewhere, in code/sa4_features.R.

Here we aggregate the SA4 level covariates up to the agglomerations in the CURF denoted with AREAENUM.
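One plausible way to do this aggregation is a population-weighted mean for rates and a sum for counts. This is a sketch under that assumption; the text does not say which aggregation rule is used, and the column names `population` and `unemp_rate` are hypothetical.

```r
# Toy SA4-level covariates; SA4 labels and values are invented
sa4 <- data.frame(
  SA4        = c("A", "B", "C"),
  AREAENUM   = c(1L, 1L, 2L),
  population = c(100, 300, 200),
  unemp_rate = c(0.04, 0.06, 0.05)
)

# Aggregate within each AREAENUM: sum counts, population-weight the rates
agg <- do.call(rbind, lapply(split(sa4, sa4$AREAENUM), function(g) {
  data.frame(
    AREAENUM   = g$AREAENUM[1],
    population = sum(g$population),
    unemp_rate = weighted.mean(g$unemp_rate, w = g$population)
  )
}))
rownames(agg) <- NULL
```

Weighting by population keeps a large SA4 from being treated the same as a small one when the two are merged into a single CURF area.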

3 AES

We read the 2022 AES. These data are supplied in Stata format with accompanying attributes such as var.labels. We recode the reported House of Representatives vote to hvote_collapsed.

3.1 Occupation codes

We pick up ANZSCO codes from the public release of the data:

Rows: 2508 Columns: 368
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (33): ID, A12_Order, A13_Order, B1_other, B9_1_other, B9_2_other, C2_O...
dbl  (334): ID_2019, ID_2016, Mode, STATE, sampsrc, accesstype, PANEL_FLAG, ...
date   (1): IntDate


3.2 Vote recode

Coalition  Greens  Independent  Labor  Other  Reported vote
800        0       0            0      0      Liberal
0          0       0            917    0      Labor
75         0       0            0      0      National Party
0          348     0            0      0      Greens
0          0       0            0      6      Other party (please specify)
0          0       61           0      0      No party
0          0       0            0      0      Australian Democrats
0          0       0            0      0      Christian Democratic Party
0          0       0            0      0      Citizens Electoral Council
0          0       0            0      2      Family First Party
0          0       0            0      30     Pauline Hanson's One Nation
0          0       0            0      1      Republican Party (replaced by Republican Party of Australia)
0          0       0            0      2      Shooters, Fishers and Farmers Party
0          0       0            0      1      Fishing Party
0          0       0            0      20     United Australia Party (formerly Palmer's United Party)
0          0       0            0      1      Katter's Australia Party
0          0       0            0      11     Liberal Democrats
0          0       0            0      0      Motoring Enthusiasts Party
0          0       0            0      0      Australian Sports Party (dissolved in 2015)
0          0       0            0      1      Reason Party (formerly The Australian Sex Party)
0          0       0            0      0      The Wikileaks Party (dissolved in 2015)
0          0       0            0      2      Australian Christians
0          0       0            0      1      Derryn Hinch's Justice Party
0          0       0            0      2      Centre Alliance (formerly Nick Xenophon Team)
0          0       0            0      0      Rise Up Australia
0          0       0            0      0      Science Party
0          0       0            0      0      Australian Liberty Alliance
0          0       0            0      0      Pirate Party
0          0       0            0      0      Jacquie Lambie Network
0          0       0            0      0      Arts Party
0          0       0            0      4      Animal Justice Party
0          0       0            0      0      Australian Cyclists Party
0          0       0            0      0      Health Australia Party
0          0       0            0      0      Affordable Housing Party
0          0       0            0      1      Australia First Party
0          0       0            0      0      Australian Better Families
0          0       0            0      0      Australian Conservatives
0          0       0            0      0      Australian People's Party
0          0       0            0      0      Australian Progressives
0          0       0            0      0      Australian Workers Party
0          0       0            0      0      Child Protection Party
0          0       0            0      0      Climate Action! Immigration Action! Accountable Politicians!
0          0       0            0      0      Country Liberals (NT)
0          0       0            0      1      Democratic Labour Party
0          0       0            0      0      Fraser Anning'S Conservative National Party
0          0       0            0      0      Help End Marijuana Prohibition (HEMP) Party
0          0       0            0      0      Independents For Climate Action Now
0          0       0            0      0      Involuntary Medication Objectors (Vaccination/Fluoride) Party
0          0       0            0      0      Labour DLP
0          0       0            0      0      Liberal National Party of Queensland
0          0       0            0      0      Love Australia or Leave
0          0       0            0      0      Non-Custodial Parents Party (Equal Parenting)
0          0       0            0      0      Secular Party of Australia
0          0       0            0      0      Seniors United Party of Australia
0          0       0            0      1      Socialist Alliance
0          0       0            0      0      Socialist Equality Party
0          0       0            0      0      Sustainable Australia
0          0       0            0      0      The Australian Mental Health Party
0          0       0            0      3      The Great Australian Party
0          0       0            0      0      The Small Business Party
0          0       0            0      0      The Together Party
0          0       0            0      0      Victorian Socialists
0          0       0            0      0      VOTEFLUX.ORG | Upgrade Democracy!
0          0       0            0      1      WESTERN AUSTRALIA PARTY
0          0       0            0      0      Yellow Vest Australia
0          0       0            0      0      Australian Citizens Party
0          0       0            0      2      Australian Federation Party
0          0       0            0      0      Australian Values Party
0          0       0            0      1      David Pocock
0          0       0            0      0      Drew Pavlou Democratic Alliance
0          0       0            0      2      FUSION: Science, Pirate, Secular, Climate Emergency
0          0       0            0      0      Federal ICAC Now
0          0       0            0      1      Indigenous - Aboriginal Party of Australia
0          0       0            0      1      Informed Medical Options Party
0          0       0            0      0      Rex Patrick Team
0          0       0            0      0      TNL
0          0       0            0      0      The Local Party of Australia
0          0       0            0      0      Swing Voter
0          0       129          0      0      Independent
0          0       0            0      7      Other party (not specified)
0          0       0            0      0      Does not apply
0          0       0            0      0      Item skipped
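The cross-tabulation above implies a mapping along the following lines. This is a sketch inferred from the non-zero cells only: the helper `collapse_vote()` is hypothetical, the treatment of "Does not apply" and "Item skipped" as missing is an assumption, and levels with zero counts (whose mapping the table cannot reveal) simply fall into "Other" here.

```r
# Hypothetical helper: collapse detailed vote labels into five groups
collapse_vote <- function(x) {
  out <- rep("Other", length(x))                       # default bucket
  out[x %in% c("Liberal", "National Party")] <- "Coalition"
  out[x %in% "Labor"] <- "Labor"
  out[x %in% "Greens"] <- "Greens"
  out[x %in% c("Independent", "No party")] <- "Independent"
  # Assumption: non-responses become missing rather than "Other"
  out[x %in% c("Does not apply", "Item skipped") | is.na(x)] <- NA
  factor(out, levels = c("Coalition", "Greens", "Independent", "Labor", "Other"))
}

# Toy responses covering each branch of the mapping
vote <- c("Liberal", "National Party", "Labor", "Greens", "No party",
          "Independent", "Pauline Hanson's One Nation", "Item skipped")
hvote_collapsed <- collapse_vote(vote)

# Cross-tabulating raw against collapsed labels checks the recode
check <- table(vote, hvote_collapsed)
```

Tabulating the original variable against the recoded one, as in the output above, is a quick way to confirm that no level has been routed to the wrong group.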

3.3 2019 vote

3.4 Collapse religion

3.5 Miscellaneous others, recoded to match CURF

3.6 Missing data codes

Recode Item skipped and Unknown to NA.
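A minimal sketch of such a recode applied across all columns of a data frame. The helper `recode_missing()` and the toy columns are hypothetical; only the two missing-data labels come from the text.

```r
# Hypothetical helper: set the listed codes to NA, dropping emptied factor levels
recode_missing <- function(x, missing_codes = c("Item skipped", "Unknown")) {
  x[x %in% missing_codes] <- NA
  if (is.factor(x)) droplevels(x) else x
}

# Toy data with one factor and one character column
aes_toy <- data.frame(
  religion = factor(c("Catholic", "Item skipped", "No religion")),
  tenure   = c("Own", "Unknown", "Rent"),
  stringsAsFactors = FALSE
)

# Apply to every column while keeping the data frame structure
aes_toy[] <- lapply(aes_toy, recode_missing)
```

Dropping the now-empty factor levels keeps "Item skipped" from reappearing as a zero-count category in later tables.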

3.7 Geo-coding

Several tasks here:

  • Validate that AES postcodes can be matched.
  • Look up SA4s of AES postcodes.
  • Merge Census POA (postcode) variables and principal components onto AES.

3.7.1 Are there unmatched PCODE (postcodes) in the AES data?

PCODE
0871
1640
8002
6872
3724
1355
0861
2001
6956

We’ll manually hard code these:
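A sketch of the check-and-patch step on toy data. The lookup table, the `manual` patch table and all SA4 labels (`sa4_x` and so on) are placeholders for illustration, not the assignments actually used.

```r
# Toy respondent postcodes, kept as character so leading zeros survive
aes_pc <- data.frame(
  ID    = 1:4,
  PCODE = c("2000", "2001", "3000", "0871"),
  stringsAsFactors = FALSE
)

# Toy postcode -> SA4 lookup that is missing two of the postcodes
poa_lookup <- data.frame(
  PCODE = c("2000", "3000"),
  SA4   = c("sa4_x", "sa4_y"),
  stringsAsFactors = FALSE
)

# Which AES postcodes have no match in the lookup?
unmatched <- sort(unique(aes_pc$PCODE[!(aes_pc$PCODE %in% poa_lookup$PCODE)]))

# Hand-coded patch for the unmatched postcodes (SA4 values are placeholders)
manual <- data.frame(
  PCODE = c("0871", "2001"),
  SA4   = c("sa4_z", "sa4_x"),
  stringsAsFactors = FALSE
)
stopifnot(all(unmatched %in% manual$PCODE))  # every gap must be covered

# Combine the lookup with the patch and attach SA4 to each respondent
lookup_full <- rbind(poa_lookup, manual)
aes_pc$SA4 <- lookup_full$SA4[match(aes_pc$PCODE, lookup_full$PCODE)]
```

The `stopifnot()` guard makes the script fail loudly if a new unmatched postcode appears that the manual table does not cover.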

3.7.2 Add SA4 agglomerations AREAENUM geocodes to match CURF

3.7.3 POA covariates

We add POA-level Census variables, or rather the first 10 principal components extracted from thousands of POA-level Census variables provided by ABS in the POA General Community Profiles (GCP); see the analysis in code/poa_features.R for details.
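For readers unfamiliar with the technique, extracting leading principal components can be sketched with base R's `prcomp()`. The matrix below is simulated noise standing in for the POA-by-variable table; the dimensions, names and the choice to centre and scale are assumptions, and the actual analysis lives in code/poa_features.R.

```r
set.seed(1)

# Simulated stand-in: 50 postal areas by 30 Census variables
n_poa <- 50
n_var <- 30
X <- matrix(
  rnorm(n_poa * n_var), nrow = n_poa,
  dimnames = list(sprintf("POA%04d", 1:n_poa), paste0("v", 1:n_var))
)

# Centre and scale each variable, then extract principal components
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Keep the scores on the first 10 components, keyed by postal area
poa_pcs <- as.data.frame(pca$x[, 1:10])
poa_pcs$POA <- rownames(pca$x)
```

The resulting `poa_pcs` table has one row per postal area and can be merged onto respondent records by postcode.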

We also add various rates and medians provided in Table G02 of the ABS POA GCP files.

3.8 Political characteristics of House seats

Create some additional CED-level covariates:

3.9 Save the copy of d2022 we have at this stage

3.10 Searchable listing of variables in the AES:

3.11 AES variables with CURF analogues

We revisit this list as we re-code AES vars to match CURF coding. The general idea will be to create a variable in the AES data with the same name and coding scheme as its CURF analogue.
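As a sketch of the idea, the example below builds an AES variable named AGEP with the same levels as its CURF analogue and then compares marginal distributions. The age bands, the AES column `age_years` and the helper `marginal()` are illustrative assumptions, not the actual coding.

```r
# Illustrative CURF coding scheme for AGEP (band labels are made up)
curf_levels <- c("18-24 years", "25-34 years", "35 years and over")
curf_toy <- data.frame(
  AGEP = factor(c("25-34 years", "35 years and over", "35 years and over",
                  "18-24 years"), levels = curf_levels)
)

# Toy AES ages in years, recoded to the same name and levels as the CURF
aes_toy <- data.frame(age_years = c(19, 30, 52, 41, 24))
aes_toy$AGEP <- cut(aes_toy$age_years, breaks = c(18, 25, 35, Inf),
                    right = FALSE, labels = curf_levels)

# The harmonised variable must share the CURF levels exactly
stopifnot(identical(levels(aes_toy$AGEP), levels(curf_toy$AGEP)))

# Hypothetical helper: marginal proportions in level order
marginal <- function(f) setNames(as.numeric(prop.table(table(f))), levels(f))
marginals <- rbind(AES = marginal(aes_toy$AGEP), CURF = marginal(curf_toy$AGEP))
```

Stacking the two sets of proportions in one matrix makes mismatched or empty categories easy to spot.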

AES marginals:

CURF marginals:

4 Check alignment between AES and CURF

We now examine the alignment between AES and CURF categories. We restrict this analysis of the CURF to records from adult Australian residents. Likewise we discard observations from

4.1 Drop unused levels on INCP and AGEP

4.2 Make AGEP continuous for imputations

Imputing a continuous quantity is easier than imputing a discrete one. We will convert back to discrete after the imputations:
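One way to make the round trip concrete, assuming AGEP is stored as age bands: represent each band by its midpoint, impute on the numeric scale, then cut back into bands. The band edges, labels and the stand-in imputed value are illustrative; the text does not specify the conversion used.

```r
# Illustrative band edges [lower, upper) and labels
breaks <- c(18, 25, 35, 45)
labels <- c("18-24", "25-34", "35-44")
agep <- factor(c("18-24", NA, "35-44", "25-34"), levels = labels)

# Discrete -> continuous: represent each band by its midpoint
mid <- (head(breaks, -1) + tail(breaks, -1)) / 2
agep_cont <- mid[as.integer(agep)]

# An imputation model would fill the NA here; this value is a stand-in
agep_cont[is.na(agep_cont)] <- 27.3

# Continuous -> discrete: cut the values back into the original bands
agep_back <- cut(agep_cont, breaks = breaks, right = FALSE, labels = labels)
```

In practice, imputed values may fall outside the outermost band edges and would need clamping before the final `cut()`.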

4.3 Add AREAENUM level covariates

4.4 Write to file

4.5 Compute summary stats for inspection